ABSTRACT
We are interested in the following general question: is it possible to abstract knowledge generated while learning the solution to a problem, so that this abstraction accelerates the learning process? Moreover, can the acquired abstract knowledge be transferred and reused to accelerate learning in future, similar tasks? We propose a framework that conducts reinforcement learning simultaneously at two levels: an abstract policy is learned while a concrete policy for the problem is learned, and both policies are refined through exploration and the agent's interaction with the environment. We exploit abstraction both to accelerate learning of an optimal concrete policy for the current problem and to allow the generated abstract policy to be applied when learning solutions to new problems. We report experiments in a robot navigation environment showing that our framework speeds up policy construction for practical problems and generates abstractions that can be used to accelerate learning in new, similar problems.
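To make the two-level idea concrete, here is a minimal, hypothetical sketch in Python. It is not the paper's algorithm: the tabular Q-learning updates, the state-abstraction function `phi`, the fallback rule (use the abstract table whenever the concrete state is unvisited), and the toy `Corridor` environment are all illustrative assumptions. The point is only to show one way a concrete table over ground states and an abstract table over `phi(state)` can be updated from the same experience, with the abstract table seeding action choices in states the concrete learner has never seen.

```python
import random

def simultaneous_q_learning(env, phi, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Two-level tabular Q-learning sketch: a concrete table keyed by
    (state, action) and an abstract table keyed by (phi(state), action).
    Both are updated from every transition the agent experiences."""
    q, q_abs = {}, {}

    def best(table, key):
        # Greedy action with random tie-breaking, defaulting unseen pairs to 0.
        vals = {a: table.get((key, a), 0.0) for a in env.actions}
        m = max(vals.values())
        return random.choice([a for a in vals if vals[a] == m])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:
                a = random.choice(env.actions)
            elif any((s, b) in q for b in env.actions):
                a = best(q, s)            # concrete estimate available
            else:
                a = best(q_abs, phi(s))   # unseen state: fall back to the abstract policy
            s2, r, done = env.step(a)
            # Concrete TD update.
            tgt = r + (0.0 if done else gamma * max(q.get((s2, b), 0.0) for b in env.actions))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (tgt - q.get((s, a), 0.0))
            # Abstract TD update over phi-states, from the same transition.
            tgt_a = r + (0.0 if done else gamma * max(q_abs.get((phi(s2), b), 0.0) for b in env.actions))
            q_abs[(phi(s), a)] = q_abs.get((phi(s), a), 0.0) + alpha * (tgt_a - q_abs.get((phi(s), a), 0.0))
            s = s2
    return q, q_abs

class Corridor:
    """Toy five-cell corridor (reward 1 at the right end), used only to exercise the sketch."""
    actions = ['L', 'R']
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = min(self.s + 1, 4) if a == 'R' else max(self.s - 1, 0)
        return self.s, (1.0 if self.s == 4 else 0.0), self.s == 4
```

Because the abstract table is keyed by `phi(state)`, it can be carried over to a new task that shares the same abstraction, which is the transfer mechanism the abstract alludes to; the fallback rule then biases early exploration in the new task instead of acting uniformly at random.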